
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Inference Recommender for HuggingFace BERT Sentiment Analysis\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Contents\n", "[1. Introduction](#1.-Introduction) \n", "[2. Download the Model & payload](#2.-Download-the-Model-&-payload) \n", "[3. Machine Learning model details](#3.-Machine-Learning-model-details) \n", "[4. Register Model Version/Package](#4.-Register-Model-Version/Package) \n", "[5. Create a SageMaker Inference Recommender Default Job](#5:-Create-a-SageMaker-Inference-Recommender-Default-Job) \n", "[6. Instance Recommendation Results](#6.-Instance-Recommendation-Results) \n", "[7. Create an Endpoint for lowest latency real-time inference](#7.-Create-an-Endpoint-for-lowest-latency-real-time-inference) \n", "[8. Clean up](#8.-Clean-up) \n", "[9. Conclusion](#9.-Conclusion)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Introduction\n", "\n", "SageMaker Inference Recommender is a new capability of SageMaker that reduces the time required to get machine learning (ML) models in production by automating performance benchmarking and load testing models across SageMaker ML instances. You can use Inference Recommender to deploy your model to a real-time inference endpoint that delivers the best performance at the lowest cost. \n", "\n", "Get started with Inference Recommender on SageMaker in minutes while selecting an instance and get an optimized endpoint configuration in hours, eliminating weeks of manual testing and tuning time.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To begin, let's update the required packages i.e. SageMaker Python SDK, `boto3`, `botocore` and `awscli`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install sagemaker botocore boto3 awscli transformers accelerate --upgrade\n", "!pip install --upgrade pip awscli botocore boto3 --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you run this notebook in SageMaker Studio, you need to make sure `ipywidgets` is installed and restart the kernel, so please uncomment the code in the next cell, and run it.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# %%capture\n", "# import IPython\n", "# import sys\n", "\n", "!{sys.executable} -m pip install ipywidgets\n", "# IPython.Application.instance().kernel.do_shutdown(True) # has to restart kernel so changes are used" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. 
Download the Model & payload\n", "\n", "In this example, we are using a `Huggingface` pre-trained `sentiment-analysis` model.\n", "\n", "You can learn more about it in the 🤗 Transformers library Quick tour: https://huggingface.co/docs/transformers/quicktour" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker import get_execution_role, Session, image_uris\n", "import pandas as pd\n", "import boto3\n", "import datetime\n", "import time\n", "import os\n", "\n", "region = boto3.Session().region_name\n", "role = get_execution_role()\n", "sagemaker_session = Session()\n", "\n", "print(region)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "export_dir = \"./model/\"\n", "\n", "if not os.path.exists(export_dir):\n", "    os.makedirs(export_dir)\n", "    print(\"Directory \", export_dir, \" created\")\n", "else:\n", "    print(\"Directory \", export_dir, \" already exists\")\n", "\n", "model_archive_name = \"hf-model.tar.gz\"\n", "payload_archive_name = \"hf_payload.tar.gz\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initiate a `Huggingface` pipeline\n", "\n", "Pipelines are a great and easy way to use models for inference. They are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the [task summary](https://huggingface.co/transformers/task_summary.html) for examples of use." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import pipeline\n", "\n", "sentiment_analysis = pipeline(\"sentiment-analysis\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Save the pre-trained model to the file system" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sentiment_analysis.save_pretrained(\"./model\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write the Inference Script\n", "\n", "To deploy a pretrained `PyTorch` model, you create a `PyTorchModel` object pointing at the model artifact and set a custom `entry_point`.\n", "\n", "You can then use the `PyTorchModel` object to deploy a `PyTorchPredictor`. This creates a `SageMaker` Endpoint -- a hosted prediction service that we can use to perform inference.\n", "\n", "An implementation of `model_fn` is required for the inference script. 
We are going to use the default implementations of `input_fn`, `predict_fn` and `output_fn` defined in [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers), and provide our own `model_fn`.\n", "\n", "Here's an example of the inference script:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cat code/inference.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use a `requirements.txt` file to add Python package dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cat code/requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the directory structure for your model files\n", "\n", "The directory structure where you saved your PyTorch model should look something like the following:\n", "\n", "```\n", "| model\n", "| |--pytorch_model.bin\n", "| |--config.json\n", "| |--vocab.txt\n", "| |--tokenizer.json\n", "| |--tokenizer_config.json\n", "| |--special_tokens_map.json\n", "|\n", "| code\n", "| |--inference.py\n", "| |--requirements.txt\n", "```\n", "\n", "Where `requirements.txt` is an optional file that specifies dependencies on third-party libraries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's copy the `code` directory into the `model` directory to comply with the directory structure mentioned above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cp -r ./code/ ./model/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls -rtlh ./model/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tar the model and code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd model && tar -cvpzf ../{model_archive_name} *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tar the payload" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cd ./sample-payload/ && tar czvf ../{payload_archive_name} *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload the model and payload to S3\n", "\n", "We now have the model archive and the payload archive ready. We need to upload them to S3 before we can use them with Inference Recommender, so we will use the SageMaker Python SDK to handle the upload.\n", "\n", "The payload archive contains the individual files that Inference Recommender can send to your SageMaker Endpoints. Inference Recommender will randomly sample files from this archive, so make sure it contains a distribution of payloads similar to what you'd expect in production. Note that your inference code must be able to read in the file formats from the sample payload."
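, "\n", "Before uploading, it's worth double-checking what's inside the payload archive, since Inference Recommender samples from exactly these files. A minimal sketch using the standard library (assuming the archive was created by the cell above):\n", "\n", "```\n", "import tarfile\n", "\n", "# List the files Inference Recommender will sample from\n", "with tarfile.open(payload_archive_name, \"r:gz\") as tar:\n", "    for member in tar.getmembers():\n", "        print(member.name, member.size, \"bytes\")\n", "```"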
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "import os\n", "import boto3\n", "import re\n", "import copy\n", "import time\n", "from time import gmtime, strftime\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "\n", "# S3 bucket for saving code and model artifacts.\n", "# Feel free to specify a different bucket and prefix\n", "bucket = sagemaker.Session().default_bucket()\n", "\n", "prefix = \"sagemaker/huggingface-pytorch-inference-recommender\"\n", "\n", "sample_payload_url = sagemaker.Session().upload_data(\n", " payload_archive_name, bucket=bucket, key_prefix=prefix + \"/inference\"\n", ")\n", "model_url = sagemaker.Session().upload_data(\n", " model_archive_name, bucket=bucket, key_prefix=prefix + \"/sentiment-analysis/model\"\n", ")\n", "\n", "\n", "print(sample_payload_url)\n", "print(model_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Machine Learning model details\n", "\n", "Inference Recommender uses information about your ML model to recommend the best instance types and endpoint configurations for deployment. You can provide as much or as little information as you'd like and Inference Recommender will use that to provide recommendations.\n", "\n", "Example ML Domains: `COMPUTER_VISION`, `NATURAL_LANGUAGE_PROCESSING`, `MACHINE_LEARNING`\n", "\n", "Example ML Tasks: `CLASSIFICATION`, `REGRESSION`, `OBJECT_DETECTION`, `OTHER`\n", "\n", "Note: Select the task that is the closest match to your model. Chose `OTHER` if none apply.\n", "\n", "Example Model name: `resnet50`, `yolov4`, `xgboost` etc\n", "\n", "Use list_model_metadata API to fetch the list of available models. This will help you to pick the closest model for better recommendation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import pandas as pd\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "list_model_metadata_response = client.list_model_metadata()\n", "\n", "domains = []\n", "frameworks = []\n", "framework_versions = []\n", "tasks = []\n", "models = []\n", "\n", "for model_summary in list_model_metadata_response[\"ModelMetadataSummaries\"]:\n", " domains.append(model_summary[\"Domain\"])\n", " tasks.append(model_summary[\"Task\"])\n", " models.append(model_summary[\"Model\"])\n", " frameworks.append(model_summary[\"Framework\"])\n", " framework_versions.append(model_summary[\"FrameworkVersion\"])\n", "\n", "data = {\n", " \"Domain\": domains,\n", " \"Task\": tasks,\n", " \"Framework\": frameworks,\n", " \"FrameworkVersion\": framework_versions,\n", " \"Model\": models,\n", "}\n", "\n", "df = pd.DataFrame(data)\n", "\n", "pd.set_option(\"display.max_rows\", None)\n", "pd.set_option(\"display.max_columns\", None)\n", "pd.set_option(\"display.width\", 1000)\n", "pd.set_option(\"display.colheader_justify\", \"center\")\n", "pd.set_option(\"display.precision\", 3)\n", "\n", "\n", "display(df.sort_values(by=[\"Domain\", \"Task\", \"Framework\", \"FrameworkVersion\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, as we are predicting Sentiment analysis with `HuggingFace` `BERT`, we select `NATURAL_LANGUAGE_PROCESSING` as the Domain, `FILL_MASK` as the Task, `PYTORCH` as the Framework, and `bert-base-uncased` as the Model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_domain = \"NATURAL_LANGUAGE_PROCESSING\"\n", "ml_task = \"FILL_MASK\"\n", "ml_framework = \"PYTORCH\"\n", "framework_version = \"1.6.0\"\n", "model = \"bert-base-uncased\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Container image URL\n", "\n", "If you don’t have an inference container image, you can use [Prebuilt SageMaker Docker Images for Deep Learning](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html) provided by AWS to serve your ML model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker import image_uris\n", "\n", "# ML model details\n", "model_name = \"huggingface-pytorch-\" + datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n", "\n", "inference_image = image_uris.retrieve(\n", " framework=\"pytorch\",\n", " region=region,\n", " version=\"1.7.1\",\n", " py_version=\"py3\",\n", " instance_type=\"ml.m5.large\",\n", " image_scope=\"inference\",\n", ")\n", "\n", "print(inference_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Register Model Version/Package\n", "\n", "Inference Recommender expects the model to be packaged in the model registry. Here, we are creating a model package group and a model package version. The model package version which takes container, model `URL` etc. will now allow you to pass additional information about the model like `Domain`, `Task`, `Framework`, `FrameworkVersion`, `NearestModelName`, `SamplePayloadUrl`\n", "You specify a list of the instance types that are used to generate inferences in real-time in`SupportedRealtimeInferenceInstanceTypes` parameter. This list of instance types is key for the inference recommender feature. For inference on tabular data, e.g. with `scikit-learn`, or `XGBoost` models you'll probably want to use standard instances or compute optimized ones. 
For deep learning models, you will probably want to use accelerated computing (GPU) instances.\n", "\n", "As the `SamplePayloadUrl` and `SupportedContentTypes` parameters are essential for benchmarking the endpoint, we also highly recommend that you specify `Domain`, `Task`, `Framework`, `FrameworkVersion` and `NearestModelName` for better inference recommendations.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "model_package_group_name = \"huggingface-pytorch-\" + str(round(time.time()))\n", "print(model_package_group_name)\n", "model_package_group_response = client.create_model_package_group(\n", "    ModelPackageGroupName=str(model_package_group_name),\n", "    ModelPackageGroupDescription=\"My sample HuggingFace PyTorch model package group\",\n", ")\n", "\n", "print(model_package_group_response)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_package_version_response = client.create_model_package(\n", "    ModelPackageGroupName=str(model_package_group_name),\n", "    ModelPackageDescription=\"HuggingFace PyTorch Inference Recommender Demo\",\n", "    Domain=ml_domain,\n", "    Task=ml_task,\n", "    SamplePayloadUrl=sample_payload_url,\n", "    InferenceSpecification={\n", "        \"Containers\": [\n", "            {\n", "                \"ContainerHostname\": \"huggingface-pytorch\",\n", "                \"Image\": inference_image,\n", "                \"ModelDataUrl\": model_url,\n", "                \"Framework\": ml_framework,\n", "                \"NearestModelName\": model,\n", "                \"Environment\": {\n", "                    \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n", "                    \"SAGEMAKER_PROGRAM\": \"inference.py\",\n", "                    \"SAGEMAKER_REGION\": region,\n", "                    \"SAGEMAKER_SUBMIT_DIRECTORY\": model_url,\n", "                },\n", "            },\n", "        ],\n", "        \"SupportedRealtimeInferenceInstanceTypes\": [\n", "            \"ml.c5.large\",\n", "            \"ml.c5.xlarge\",\n", "            \"ml.c5.2xlarge\",\n", "            \"ml.m5.xlarge\",\n", "            \"ml.m5.2xlarge\",\n", "        ],\n", "        \"SupportedContentTypes\": [\"text/csv\"],\n", "        \"SupportedResponseMIMETypes\": [\"text/csv\"],\n", "    },\n", ")\n", "\n", "print(model_package_version_response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative Option: ContainerConfig\n", "\n", "If your model package version is missing fields that are mandatory for creating an Inference Recommender job, like so (this `create_model_package` call is missing `Domain`, `Task`, and `SamplePayloadUrl`):\n", "\n", "```\n", "client.create_model_package(\n", "    ModelPackageGroupName=str(model_package_group_name),\n", "    ModelPackageDescription=\"HuggingFace PyTorch Inference Recommender Demo\",\n", "    InferenceSpecification={\n", "        \"Containers\": [\n", "            {\n", "                \"ContainerHostname\": \"huggingface-pytorch\",\n", "                \"Image\": inference_image,\n", "                \"ModelDataUrl\": model_url,\n", "                \"Framework\": ml_framework,\n", "                \"NearestModelName\": model,\n", "                \"Environment\": {\n", "                    \"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\",\n", "                    \"SAGEMAKER_PROGRAM\": \"inference.py\",\n", "                    \"SAGEMAKER_REGION\": region,\n", "                    \"SAGEMAKER_SUBMIT_DIRECTORY\": model_url,\n", "                },\n", "            },\n", "        ],\n", "        \"SupportedRealtimeInferenceInstanceTypes\": [\n", "            \"ml.c5.large\",\n", "            \"ml.c5.xlarge\",\n", "            \"ml.c5.2xlarge\",\n", "            \"ml.m5.xlarge\",\n", "            \"ml.m5.2xlarge\",\n", "        ],\n", "        \"SupportedContentTypes\": [\"text/csv\"],\n", "        \"SupportedResponseMIMETypes\": [\"text/csv\"],\n", "    },\n", ")\n", "```\n", "\n", "You may define the fields `Domain`, `Task`, and `SamplePayloadUrl` in the optional 
field `ContainerConfig` like so:\n", "\n", "```\n", "payload_config = {\n", "    \"SamplePayloadUrl\": sample_payload_url,\n", "}\n", "\n", "container_config = {\n", "    \"Domain\": ml_domain,\n", "    \"Task\": ml_task,\n", "    \"PayloadConfig\": payload_config,\n", "}\n", "```\n", "\n", "And then provide it directly within the `create_inference_recommendations_job()` API like so:\n", "\n", "```\n", "default_response = client.create_inference_recommendations_job(\n", "    JobName=str(default_job),\n", "    JobDescription=\"\",\n", "    JobType=\"Default\",\n", "    RoleArn=role,\n", "    InputConfig={\n", "        \"ModelPackageVersionArn\": model_package_arn,\n", "        \"ContainerConfig\": container_config,\n", "    },\n", ")\n", "```\n", "\n", "For more information on what else can be provided via `ContainerConfig`, please refer to the API doc here: [CreateInferenceRecommendationsJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateInferenceRecommendationsJob.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Create a SageMaker Inference Recommender Default Job\n", "\n", "Now with your model in the Model Registry, you can kick off a 'Default' job to get instance recommendations. This only requires your `ModelPackageVersionArn` and comes back with recommendations within an hour. \n", "\n", "The output is a list of instance type recommendations with associated environment variables, cost, throughput and latency metrics." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "from sagemaker import get_execution_role\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "role = get_execution_role()\n", "default_job = \"huggingface-pytorch-basic-recommender-job-\" + datetime.datetime.now().strftime(\n", "    \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "default_response = client.create_inference_recommendations_job(\n", "    JobName=str(default_job),\n", "    JobDescription=\"HuggingFace PyTorch Inference Basic Recommender Job\",\n", "    JobType=\"Default\",\n", "    RoleArn=role,\n", "    InputConfig={\"ModelPackageVersionArn\": model_package_version_response[\"ModelPackageArn\"]},\n", ")\n", "\n", "print(default_response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Instance Recommendation Results\n", "\n", "The Inference Recommender job provides multiple endpoint recommendations in its result. Each recommendation includes `InstanceType`, `InitialInstanceCount` and `EnvironmentParameters`, which include tuned parameters for better performance. It also includes benchmarking results like `MaxInvocations`, `ModelLatency`, `CostPerHour` and `CostPerInference` for deeper analysis. This information will help you narrow down to a specific endpoint configuration that suits your use case.\n", "\n", "For example:\n", "\n", "- If your motivation is overall price-performance, then you should focus on the `CostPerInference` metric. \n", "- If your motivation is latency/throughput, then you should focus on the `ModelLatency` / `MaxInvocations` metrics." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running the Inference Recommender job takes ~35 minutes."
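, "\n", "If you need to cancel a job before it finishes (for example, to adjust its configuration and resubmit), you can stop it with the `StopInferenceRecommendationsJob` API. A minimal sketch:\n", "\n", "```\n", "# Stop the Inference Recommender job before it completes\n", "client.stop_inference_recommendations_job(JobName=str(default_job))\n", "```"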
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "import boto3\n", "import pprint\n", "import pandas as pd\n", "\n", "client = boto3.client(\"sagemaker\", region)\n", "\n", "ended = False\n", "while not ended:\n", " inference_recommender_job = client.describe_inference_recommendations_job(\n", " JobName=str(default_job)\n", " )\n", " if inference_recommender_job[\"Status\"] in [\"COMPLETED\", \"STOPPED\", \"FAILED\"]:\n", " ended = True\n", " else:\n", " print(\"Inference recommender job in progress\")\n", " time.sleep(60)\n", "\n", "if inference_recommender_job[\"Status\"] == \"FAILED\":\n", " print(\"Inference recommender job failed \")\n", " print(\"Failed Reason: {}\".inference_recommender_job[\"FailedReason\"])\n", "else:\n", " print(\"Inference recommender job completed\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Detailing out the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = [\n", " {**x[\"EndpointConfiguration\"], **x[\"ModelConfiguration\"], **x[\"Metrics\"]}\n", " for x in inference_recommender_job[\"InferenceRecommendations\"]\n", "]\n", "df = pd.DataFrame(data)\n", "dropFilter = df.filter([\"VariantName\"])\n", "df.drop(dropFilter, inplace=True, axis=1)\n", "pd.set_option(\"max_colwidth\", 400)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's sort the result `dataframe` by `MaxInvocations` - The maximum number of requests per minute expected for the endpoint, in descending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.sort_values(by=[\"MaxInvocations\"], ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, let's sort the result `dataframe` by `ModelLatencyThresholds` - The interval of time taken by a model to respond as viewed from SageMaker. The interval includes the local communication time taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.sort_values(by=[\"ModelLatency\"]).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's choose the instance with the lowest `ModelLatency`. This is done by choosing the first record of the result `dataframe`, sorted by ascending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "instance_type = (\n", " df.sort_values(by=[\"ModelLatency\"]).head(1)[\"InstanceType\"].to_string(index=False).strip()\n", ")\n", "instance_type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optional: ListInferenceRecommendationsJobSteps\n", "To see the list of subtasks for an Inference Recommender job, simply provide the `JobName` to the `ListInferenceRecommendationsJobSteps` API. \n", "\n", "To see more information for the API, please refer to the doc here: [ListInferenceRecommendationsJobSteps](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListInferenceRecommendationsJobSteps.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list_job_steps_response = client.list_inference_recommendations_job_steps(JobName=str(default_job))\n", "print(list_job_steps_response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
Create an Endpoint for lowest latency real-time inference\n", "\n", "Next, we will create a SageMaker real-time endpoint using the instance type with the lowest `ModelLatency` for the model, as determined by the Inference Recommender Default Job that was run previously." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_package_arn = model_package_version_response[\"ModelPackageArn\"]\n", "print(\"ModelPackage Version ARN : {}\".format(model_package_arn))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Model Groups and Versions\n", "\n", "You can view details of a specific model version by using either the AWS SDK for Python (Boto3) or Amazon SageMaker Studio.\n", "\n", "To view the details of a model version by using Boto3, call the `list_model_packages` method to view the model versions in a model group" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list_model_packages_response = client.list_model_packages(\n", "    ModelPackageGroupName=model_package_group_name\n", ")\n", "list_model_packages_response" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_version_arn = list_model_packages_response[\"ModelPackageSummaryList\"][0][\"ModelPackageArn\"]\n", "print(model_version_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Model Version Details\n", "\n", "Call `describe_model_package` to see the details of the model version. You pass in the ARN of a model version that you got in the output of the call to `list_model_packages`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client.describe_model_package(ModelPackageName=model_version_arn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Update Model Approval Status\n", "\n", "After you create a model version, you typically want to evaluate its performance before you deploy it to a production endpoint. If it performs to your requirements, you can update the approval status of the model version to `Approved`. Setting the status to `Approved` can initiate CI/CD deployment for the model. If the model version does not perform to your requirements, you can update the approval status to `Rejected`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_package_update_input_dict = {\n", "    \"ModelPackageArn\": model_package_arn,\n", "    \"ModelApprovalStatus\": \"Approved\",\n", "}\n", "model_package_update_response = client.update_model_package(**model_package_update_input_dict)\n", "model_package_update_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy the Model in the Registry\n", "\n", "After you register a model version and approve it for deployment, deploy it to a SageMaker endpoint for real-time inference.\n", "\n", "When you create an `MLOps` project and choose an `MLOps` project template that includes model deployment, approved model versions in the model registry are automatically deployed to production. 
For information about using SageMaker `MLOps` projects, see [Automate `MLOps` with SageMaker Projects](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html).\n", "\n", "To deploy a model version using the AWS SDK for Python (Boto3), we'll create a model object from the model version by calling the [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) method. Pass the Amazon Resource Name (ARN) of the model version as part of the Containers for the model object.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_name = \"huggingface-pytorch-model-\" + datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n", "print(\"Model name : {}\".format(model_name))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "primary_container = {\n", "    \"ModelPackageName\": model_version_arn,\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_model_response = client.create_model(\n", "    ModelName=model_name, ExecutionRoleArn=get_execution_role(), PrimaryContainer=primary_container\n", ")\n", "\n", "print(\"Model arn : {}\".format(create_model_response[\"ModelArn\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an Endpoint Config from the model\n", "\n", "This will create an endpoint configuration that Amazon SageMaker hosting services uses to deploy models. In the configuration, you identify one or more models, created using the `CreateModel` API, to deploy, and the resources that you want Amazon SageMaker to provision. Then you call the `CreateEndpoint` API.\n", "\n", "More info on `create_endpoint_config` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_config_name = \"huggingface-pytorch-endpoint-config-\" + datetime.datetime.now().strftime(\n", "    \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "\n", "endpoint_config_response = client.create_endpoint_config(\n", "    EndpointConfigName=endpoint_config_name,\n", "    ProductionVariants=[\n", "        {\n", "            \"VariantName\": \"AllTrafficVariant\",\n", "            \"ModelName\": model_name,\n", "            \"InitialInstanceCount\": 1,\n", "            \"InstanceType\": instance_type,\n", "            \"InitialVariantWeight\": 1,\n", "        },\n", "    ],\n", ")\n", "\n", "endpoint_config_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploy the Endpoint Config to a real-time endpoint\n", "\n", "This will create an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models. 
Note that you have already created the endpoint configuration with the `CreateEndpointConfig` API in the previous step.\n", "\n", "More info on `create_endpoint` can be found on the [Boto3 SageMaker documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = \"huggingface-pytorch-endpoint-\" + datetime.datetime.now().strftime(\n", "    \"%Y-%m-%d-%H-%M-%S\"\n", ")\n", "\n", "create_endpoint_response = client.create_endpoint(\n", "    EndpointName=endpoint_name,\n", "    EndpointConfigName=endpoint_config_name,\n", ")\n", "\n", "create_endpoint_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wait for Endpoint to be ready" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "\n", "while describe_endpoint_response[\"EndpointStatus\"] == \"Creating\":\n", "    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "    print(describe_endpoint_response[\"EndpointStatus\"])\n", "    time.sleep(15)\n", "\n", "describe_endpoint_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Invoke Endpoint with `boto3`\n", "\n", "After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint.\n", "\n", "For an overview of Amazon SageMaker, see [How It Works](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works.html).\n", "\n", "Amazon SageMaker strips all POST headers except those supported by the API. Amazon SageMaker might add additional headers. You should not rely on the behavior of headers outside those enumerated in the request syntax.\n", "\n", "Calls to `InvokeEndpoint` are authenticated by using AWS Signature Version 4. For information, see Authenticating Requests (AWS Signature Version 4) in the Amazon S3 API Reference.\n", "\n", "A customer's model containers must respond to requests within 60 seconds. The model itself can have a maximum processing time of 60 seconds before responding to invocations. If your model is going to take 50-60 seconds of processing time, the SDK socket timeout should be set to 70 seconds.\n", "\n", "More info on `invoke_endpoint` can be found on the [Boto3 `SageMakerRuntime` documentation page](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test_data = pd.read_csv(\"./sample-payload/test_data.csv\", header=None)\n", "test_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "runtime = boto3.client(\"sagemaker-runtime\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = runtime.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    Body=test_data.to_csv(header=False, index=False),\n", "    ContentType=\"text/csv\",\n", ")\n", "\n", "print(response[\"Body\"].read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Clean up\n", "\n", "Endpoints should be deleted when no longer in use, since (per the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/)) they're billed by time deployed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client.delete_endpoint(EndpointName=endpoint_name)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "## 9. Conclusion\n", "\n", "In this notebook you successfully downloaded a `Huggingface` pre-trained `sentiment-analysis` model, you compressed the `model` and the payload and upload it to Amazon S3. \n", "Then you registered the Model Version, and triggered a SageMaker Inference Recommender Default Job.\n", "\n", "You then browsed the results, sorted by `MaxInvocations` and by `ModelLatency`, and decided to create an Endpoint for the lowest latency real-time inference.\n", "After deploying the model to a real-time endpoint, you invoked the Endpoint with a sample payload of few sentences, using `boto3`, and got the predictions result.\n", "\n", "As next steps, you can try running SageMaker Inference Recommender on your own models, to select an instance with the best price performance for your needs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/sagemaker-inference-recommender|huggingface-inference-recommender|huggingface-inference-recommender.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, 
"category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": 
"ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (PyTorch 1.12 Python 3.8 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/pytorch-1.12-cpu-py38" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" } }, "nbformat": 4, "nbformat_minor": 4 }